Evaluating extracted phrases and extending thesauri

Authors

  • Gordon W. Paynter
  • Sally Jo Cunningham
  • Ian H. Witten
Abstract

We describe an interface that uses the phrases occurring in a document collection as a basis for browsing the collection and accessing its contents. Phrases are automatically extracted from the document text to represent the subject matter of the collection. Clearly, the interface’s utility depends on how good these phrases are. We evaluate the system by comparing the phrases extracted from a large Web site to those in a thesaurus used by the organization responsible for the site. This analysis serves two purposes: it aids the user by verifying that the phrases extracted are relevant to, and provide good coverage of, the subject areas of the Web site and thesaurus; and it aids the thesaurus compiler by identifying phrases in widespread use that do not appear in the thesaurus.
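
The comparison described in the abstract can be sketched as a set overlap between the extracted phrases and the thesaurus terms. This is a minimal illustration, not the authors' procedure: the function names, the exact-match rule after lowercasing and whitespace normalization, and the sample phrases are all assumptions made for the example.

```python
def normalize(phrase: str) -> str:
    """Lowercase and collapse whitespace so trivially different spellings match."""
    return " ".join(phrase.lower().split())


def compare_phrases(extracted, thesaurus):
    """Compare two phrase collections by exact match after normalization.

    Hypothetical helper: returns the shared phrases, the fraction of
    thesaurus terms found among the extracted phrases, the fraction of
    extracted phrases found in the thesaurus, and the extracted phrases
    that the thesaurus lacks (candidates for a compiler to review).
    """
    ext = {normalize(p) for p in extracted}
    the = {normalize(p) for p in thesaurus}
    shared = ext & the
    return {
        "shared": shared,
        "thesaurus_coverage": len(shared) / len(the) if the else 0.0,
        "extracted_in_thesaurus": len(shared) / len(ext) if ext else 0.0,
        "candidates": ext - the,
    }


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    extracted = ["Food Security", "soil  erosion", "rice production"]
    thesaurus = ["food security", "Soil erosion", "irrigation", "forestry"]
    report = compare_phrases(extracted, thesaurus)
    print(sorted(report["shared"]))
    print(report["thesaurus_coverage"])
    print(sorted(report["candidates"]))
```

The two ratios correspond to the two purposes named in the abstract: the overlap indicates how well the extracted phrases cover the thesaurus's subject areas, and the `candidates` set lists phrases in use that the thesaurus does not contain. A real comparison would likely need looser matching (stemming, word-order variants) than the exact match assumed here.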

Similar resources

Evaluation of Syntactic Phrase Indexing -- CLARIT NLP Track Report

The CLARIT NLP track effort is focused on evaluating the usefulness of syntactic phrases for document indexing. The CLARIT system has several NLP techniques integrated with the vector space retrieval model [Evans et al. 91, Evans et al. 95]. The NLP techniques used in CLARIT include morphological analysis, robust noun-phrase parsing, and automatic construction of first order thesauri, among others...

Automatic Extraction of Cue Phrases for Cross-Corpus Dialogue Act Classification

In this paper, we present an investigation into the use of cue phrases as a basis for dialogue act classification. We define what we mean by cue phrases, and describe how we extract them from a manually labelled corpus of dialogue. We describe one method of evaluating the usefulness of such cue phrases, by applying them directly as a classifier to unseen utterances. Once we have extracted cue p...

Toward Automatic Compilation of Phrasal Thesaurus

Thesaurus, which links between linguistic expressions (or concepts) based on various semantic relations, is one of the most fundamental semantic resources in a broad range of NLP tasks. A lot of work has been carried out relying on thesauri, such as WordNet (Miller, 1995) and automatically created versions of it. The entries of most existing thesauri are either single words or word sequences inc...

Thesaurus Extension Using Web Search Engines

Maintaining and extending large thesauri is an important challenge facing digital libraries and IT businesses alike. In this paper we describe a method building on and extending existing methods from the areas of thesaurus maintenance, natural language processing, and machine learning to (a) extract a set of novel candidate concepts from text corpora and (b) to generate a small ranked list of s...

Topic-specific Web Searching based on a Real-text Dictionary

The contributions of this paper are twofold. First, we present a new type of dictionary that is intended as a search assistance in topic-specific Web searching. The method to construct the dictionary is a general method that can be applied to any reasonable topic. The first implementation deals with climate change. The dictionary has the following new features compared to standard dictionaries ...

Publication date: 2000